Towards Efficient Graph Traversal using a Multi-GPU Cluster
نویسندگان
چکیده
Graph processing has always been a challenge, as there are inherent complexities in it. These include scalability to larger data sets and clusters, dependencies between vertices in the graph, irregular memory accesses during processing and traversals, minimal locality of reference, etc. In literature, there are several implementations for parallel graph processing on single GPU systems but only few for single and multi-node multiGPU systems. In this paper, the prospects of improvement in large graph traversals by utilizing multi-GPU cluster for Breadth First Search algorithm has been studied. In this regard, a DiGPU, a CUDA-based implementation for graph traversal in shared memory multi-GPU and distributed memory multi-GPU systems has been proposed. In this work, an open source software module has also been developed and verified through set of experiments. Further, evaluations have been demonstrated on local cluster as well as on CDER cluster. Finally, experimental analysis has been performed on several graph data sets using different system configurations to study the impact of load distribution with respect to GPU specification on performance of our implementation. Keywords—Graph processing; GPU cluster; distributed graph traversal API; CUDA; BFS; MPI
منابع مشابه
Stackless Multi-BVH Traversal for CPU, MIC and GPU Ray Tracing
Stackless traversal algorithms for ray tracing acceleration structures require significantly less storage per ray than ordinary stack-based ones. This advantage is important for massively parallel rendering methods, where there are many rays in flight. On SIMD architectures, a commonly used acceleration structure is the multi bounding volume hierarchy (MBVH), which has multiple bounding boxes p...
متن کاملData-parallel agent-based microscopic road network simulation using graphics processing units
Road network microsimulation is computationally expensive, and existing state of the art commercial tools use task parallelism and coarse-grained data-parallelism for multi-core processors to achieve improved levels of performance. An alternative is to use Graphics Processing Units (GPUs) and fine-grained data parallelism. This paper describes a GPU accelerated agent based microsimulation model...
متن کاملUnderstanding the SIMD Efficiency of Graph Traversal on GPU
Graph is a widely used data structure and graph algorithms, such as breadth-first search (BFS), are regarded as key components in a great number of applications. Recent studies have attempted to accelerate graph algorithms on highly parallel graphics processing unit (GPU). Although many graph algorithms based on large graphs exhibit abundant parallelism, their performance on GPU still faces for...
متن کاملGPU-Based Parallel Collision Detection for Real-Time Motion Planning
We present parallel algorithms to accelerate collision queries for samplebased motion planning. Our approach is designed for current many-core GPUs and exploits the data-parallelism and multi-threaded capabilities. In order to take advantage of high number of cores, we present a clustering scheme and collision-packet traversal to perform efficient collision queries on multiple configurations si...
متن کاملUsing Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach
While it is well-known and acknowledged that the performance of graph algorithms is heavily dependent on the input data, there has been surprisingly little research to quantify and predict the impact the graph structure has on performance. Parallel graph algorithms, running on many-core systems such as GPUs, are no exception: most research has focused on how to efficiently implement and tune di...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017